CDFA Annual Reports

State-level data

The data presented below come from the reports listed here. I did some additional processing of the dataset, which is documented in the nass.R script in this repository. I focus on a subset of crops below, selected on several metrics: the commodity's rank in contribution to GDP, whether CA is the top national producer, US production rank, whether CA is the only producer nationally, net water use, and jobs per acre-foot of water used.[1]

The TOTAL VALUE variable is based on the value of the quantity harvested for crops and the value of the quantity marketed for livestock. Values for “Livestock” and “Dairy” come from a different table, where TOTAL VALUE refers to the gross value of commodities and services produced within a year. The variables AREA HARVESTED and CA SHARE were not available for “Livestock” and “Dairy,” though CA is consistently one of the top producers in these sectors.

“Dairy” includes “milk and cream.” “Alfalfa” includes “Hay, alfalfa and other.” “Cattle” includes all “meat animals.” Values for “Tomatoes” include “Tomatoes, fresh market” and “Tomatoes, processing.” Tomato shares of CA production in the US market are for “Tomatoes, processing,” since the aggregate “Tomatoes” category was not available. “Lettuce” includes “Lettuce, Romaine,” “Lettuce, Head,” and “Lettuce, Leaf.” Shares of CA production in the US market for “Lettuce” are for “Lettuce, Leaf,” as aggregate data were not available.
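The grouping described above can be sketched in base R as a lookup from detailed commodity names to aggregate crop labels, followed by a sum. This is only an illustration: the data frame `state`, its column names, and all values are made up, and the actual aggregation lives in nass.R.

```r
# Hypothetical rows in the shape of the state-level table; values are made up.
state <- data.frame(
  Commodity = c("Tomatoes, fresh market", "Tomatoes, processing",
                "Lettuce, Romaine", "Lettuce, Head", "Lettuce, Leaf"),
  Year = 2015,
  Total.Value = c(10, 30, 5, 7, 3),
  stringsAsFactors = FALSE
)

# Map each detailed commodity name to its aggregate crop label.
crop_map <- c(
  "Tomatoes, fresh market" = "Tomatoes",
  "Tomatoes, processing"   = "Tomatoes",
  "Lettuce, Romaine"       = "Lettuce",
  "Lettuce, Head"          = "Lettuce",
  "Lettuce, Leaf"          = "Lettuce"
)
state$Crop <- unname(crop_map[state$Commodity])

# Sum value within each aggregate crop and year.
agg <- aggregate(Total.Value ~ Crop + Year, data = state, FUN = sum)
```

Values can be summed across sub-categories this way, but shares cannot, which is why the CA shares above are reported for a single sub-category instead.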

County-level data

The county-annual datasets were found here. Average yields for the state of CA through time for each crop are plotted below. Note some extreme outliers, which likely reflect data errors, since yields are unlikely to vary that much across counties. In these plots, dots are counties (n=58) and the line is a LOESS curve fit to the points, with rows containing NAs removed before fitting. The original NASS data use multiple naming conventions for each crop (due to changes in the coding system through time). See nass.R for how these variables were aggregated; the script also documents the removal of outliers.
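As a rough sketch of the smoothing step described above, the following base R snippet drops rows with NAs and fits a LOESS curve of yield on year. The data are synthetic and the names `toy`, `toy_complete`, and `grid` are hypothetical; the plots in this report may have been produced differently (for example with ggplot2's default smoother settings).

```r
set.seed(1)
# Synthetic county-year yields (hypothetical values, not NASS data).
toy <- data.frame(
  Year  = rep(1980:2015, times = 5),
  Yield = rep(seq(1, 2, length.out = 36), times = 5) + rnorm(180, sd = 0.1)
)
toy$Yield[c(3, 50)] <- NA   # introduce some missing observations

# Drop rows containing NAs before fitting.
toy_complete <- toy[complete.cases(toy), ]

# Fit a LOESS curve of yield on year across all points.
fit <- loess(Yield ~ Year, data = toy_complete)

# Evaluate the smooth once per year; these values trace the fitted line.
grid <- data.frame(Year = 1980:2015)
grid$Smooth <- predict(fit, newdata = grid)
```

The smoothed values in `grid` can then be drawn over a scatter of the points, for example with `plot()` followed by `lines(grid$Year, grid$Smooth)`.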

County-level yield

County-level acreage

Note that for several crops (almonds, pistachios), there are strong paths in the data. Each path traces a single county with exceptionally high acreage through time.

Here’s a summary plot of all counties:

County-level production (Tons)

County-level net value (USD)

County-level price per unit (USD)

Exploring spatial variation in yields

County-level yield variation

Look at this website for plotting tips.

County-level harvested acreage variation

Notes and limitations

From Matt Yost, Agricultural Extension Expert, Utah State University

One issue is the consistency in the quality of the NASS survey data from county to county. How accurately did the growers in each county measure and report their yield? Also, was the proportion of good vs. not-so-good management by growers equivalent from county to county, and did the same growers report their yield from year to year?

Many growers don’t measure yield very well. Yield monitoring is perhaps best with grain crops or anything else harvested with a combine because most new combines come equipped with yield monitors. However, these monitors have to be constantly calibrated to really produce accurate yield information. The old way is to measure mass/area of some of the crop coming off the field. Many farmers will do this if they are selling the crop. If it is used for feed on their own farm, then they often will not. Thus, differences in whether producers sell or keep their crops, within and among counties, could cause differences in reported yield.

Another factor is environmental differences within and among counties. For example, more growers with poor ground could have been surveyed in one county vs. more growers with good ground in a neighboring county.

Many farmers will have land in multiple counties. When they fill out their survey from NASS, I’m not sure how they report their information, that is, whether they are surveyed based on where they live or where the majority of their fields are. I’m sure that NASS probably handles this, but I’m not cognizant of their procedures.

So, in summary, I think a lot of it is probably not real, but rather an artifact of the fact that only relatively small portions of the land have reliable yield data. I usually don’t put a lot of stock in county-level yield estimates for the reasons stated above – maybe I’m too much of a critic. You might talk with someone at NASS to learn more about how they collect and analyze their data.

Some of it may be real (there can be large variation in a single field), but it’s really hard to know. You could take the CDL and add SSURGO soil data to identify which types of soil the crop was grown on in each county and then determine whether soil productivity differences were a factor.

Attached are some references that I thought of or found. I don’t think they really address what you are after though – causal factors of yield variation. I’ll pass on anything else if I think of it.

-Matt

Space-time patterns

Plotting change in yield spatially over the course of the drought (2008 - 2015) to come!

library(dplyr)
library(ggplot2)

crop <- "Almonds"

# Per-county linear trends in yield since 2008.
# County is mapped to color only. The shape palette handles at most 6
# discrete values, and this crop has 16 counties, so mapping shape to County
# triggers a warning and leaves the extra counties without a shape (the
# likely source of the 80 rows that geom_point() dropped in the original run).
df %>%
  filter(Crop.Name == crop, Year > 2007) %>%
  ggplot(aes(x = Year, y = Yield, color = County)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # fitted lines without ribbons

# single county
# county <- "Butte"
# df %>% filter(Crop.Name == crop, County == county, Year > 2007) %>%
#   ggplot(., aes(x = Year, y = Yield)) +
#   geom_point() +
#   geom_smooth(method = "lm", fill = NA)
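One way to summarise the change in yield over the drought years, ahead of mapping it, is to reduce each county's regression line to its slope. The sketch below does this in base R on made-up data. The data frame `toy` and its values are hypothetical, and the column names simply mirror those used in the plotting code.

```r
# Hypothetical county-year yields for one crop (made-up numbers).
toy <- data.frame(
  County = rep(c("A", "B"), each = 8),
  Year   = rep(2008:2015, times = 2),
  Yield  = c(1.00 + 0.05 * (0:7),   # County A: rising yield
             1.20 - 0.02 * (0:7))   # County B: falling yield
)

# Fit Yield ~ Year separately for each county and keep the Year slope.
slopes <- sapply(split(toy, toy$County), function(d) {
  unname(coef(lm(Yield ~ Year, data = d))["Year"])
})
```

The result is a named vector with one slope per county (yield units per year), which could later be joined to county polygons for a map.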

PCA

library(stats)
#https://www.r-bloggers.com/computing-and-visualizing-pca-in-r/
#http://www.math.canterbury.ac.nz/~r.vale/PCA.pdf
#http://adegenet.r-forge.r-project.org/files/tutorial-spca.pdf
#http://www.tandfonline.com/doi/full/10.1080/00045608.2012.689236?scroll=top&needAccess=true

# read spatial PCA article

ly <- df %>% filter(Crop.Name == "Almonds") %>% mutate(ly = log(Yield))
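A possible next step, sketched here on synthetic data, is to reshape the log yields into a county-by-year matrix and run an ordinary (non-spatial) PCA with `prcomp()`. This is an assumption about where the analysis might go, not the method used in this report. The data frame `toy` is made up, and real county panels contain NAs that would need to be handled first, since `prcomp()` does not accept missing values.

```r
set.seed(42)
# Synthetic complete panel: 6 hypothetical counties x 8 years of yields.
toy <- expand.grid(County = paste0("County", 1:6), Year = 2008:2015)
toy$Yield <- exp(rnorm(nrow(toy), mean = 0, sd = 0.2))
toy$ly <- log(toy$Yield)

# Reshape to a county-by-year matrix of log yields.
wide <- tapply(toy$ly, list(toy$County, toy$Year), mean)

# PCA with counties as observations and years as variables.
pca <- prcomp(wide, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

Taking counties as observations means the components describe shared temporal patterns across counties; transposing `wide` would instead treat years as observations.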

  1. Documented in the top_crop_selection spreadsheet.